Serialize engine config in new pdsh benchmark CLI#22365
TomAugspurger wants to merge 13 commits into `rapidsai:main`
Conversation
This updates the cudf-polars benchmarks CLI to serialize the engine configuration. This will let us see what options were *actually* used.
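As a rough sketch of the idea (the names `EngineConfig` and `serialize_run` here are hypothetical; the actual CLI uses its own RunConfig/serialize machinery):

```python
import dataclasses
import json


@dataclasses.dataclass
class EngineConfig:
    # Illustrative fields only; the real engine exposes many more options.
    executor: str = "streaming"
    spill_to_pinned_memory: bool = False


def serialize_run(config: EngineConfig) -> str:
    # Recording the resolved config alongside benchmark results makes it
    # possible to see which options were actually in effect for a run.
    return json.dumps({"config_options": dataclasses.asdict(config)}, indent=2)


print(serialize_run(EngineConfig(spill_to_pinned_memory=True)))
```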
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
This adds back a `--spill-to-pinned-memory` CLI option and threads it through for the new frontends.
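For reference, a minimal sketch of how a flag like this can be wired through a CLI (illustrative only; the program name, default, and options dict are assumptions, not the actual benchmark code):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction; the real option lives in the
    # cudf-polars benchmark utilities.
    parser = argparse.ArgumentParser(prog="pdsh-bench")
    parser.add_argument(
        "--spill-to-pinned-memory",
        action=argparse.BooleanOptionalAction,
        default=False,
        help="Spill device memory to pinned host memory under pressure.",
    )
    return parser


args = build_parser().parse_args(["--spill-to-pinned-memory"])
# The parsed flag is then threaded into the frontend/engine options.
engine_options = {"spill_to_pinned_memory": args.spill_to_pinned_memory}
print(engine_options)
```

`BooleanOptionalAction` also generates the matching `--no-spill-to-pinned-memory` form for free.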
This needs more work. We need to ensure that the …
I've updated this to serialize the options, and included some example values. Unfortunately, I don't think rapidsmpf currently has a way to view the actual values the options resolve to without explicitly setting or using them, which is why the dask/ray examples only have the explicitly set options.
@madsbk do you have any ideas about how to handle #22365 (comment)? Specifically, whether there's a way to see the resolved values of options that haven't been explicitly set.
I don’t think it is possible with the current design, where each call site parses the string value and defines its own default.
An empty string means that no one has accessed and parsed that option yet.
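A minimal sketch of that design (a toy model, not the actual rapidsmpf API): each call site supplies its own parser and default, so an option's resolved value is unknowable until some code actually reads it.

```python
from __future__ import annotations


class Options:
    """Toy model of string-backed options where each call site
    parses the raw string and supplies its own default."""

    def __init__(self, strings: dict[str, str] | None = None) -> None:
        self._strings = dict(strings or {})

    def get(self, key: str, *, default, parse):
        # An empty/missing string means nobody has set the option; the
        # call site's default wins, and other code never learns which
        # value was actually used.
        raw = self._strings.get(key, "")
        return default if raw == "" else parse(raw)

    def get_strings(self) -> dict[str, str]:
        # Only explicitly set raw strings are visible here.
        return dict(self._strings)


opts = Options({"spill_device": "0.5"})
assert opts.get("spill_device", default=0.8, parse=float) == 0.5
assert opts.get("periodic_spill_check", default=1e-3, parse=float) == 1e-3
# get_strings() cannot report the resolved default for unset keys:
assert "periodic_spill_check" not in opts.get_strings()
```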
OK, I think this might be all we can do on the cudf-polars side then. I've opened rapidsai/rapidsmpf#1014 to track what I think needs to happen on the rapidsmpf side, and then hopefully we can get back to a state where all of the actual runtime options are being recorded. For now, I'll merge this so that the non-rapidsmpf options are being recorded properly.
📝 Walkthrough (Summary by CodeRabbit)

This PR persists RapidsMPF options as instance attributes on the streaming engine backends (SPMD, Ray, Dask) and updates benchmark serialization to accept the engine.

Changes: RapidsMPF options persistence and serialization.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@python/cudf_polars/cudf_polars/experimental/benchmarks/utils_new_frontends.py`:
- Around line 597-601: The serialized RapidsMPF options are stale because
serialize() reads engine.rapidsmpf_options while the frontend _reset() path
rebuilds the live Context from new Options without updating that cached
attribute; fix by ensuring the source of truth is consistent — either update
self.rapidsmpf_options inside every frontend _reset() (the methods that rebuild
Context/Options) to reflect the newly resolved Options, or change serialize() to
read/serialize from the same freshly resolved Options object used in the reset
path (the local variable passed into Context rebuild) instead of
engine.rapidsmpf_options; reference methods: serialize(), _reset(),
engine.rapidsmpf_options, self.rapidsmpf_options, Context, Options.
- Around line 1201-1203: The call to _finalize_benchmark_run(...) is happening
after exiting the with RayEngine(...) block so engine shutdown has already
occurred; move the _finalize_benchmark_run(args, run_config,
validation_failures, query_failures, engine=engine) invocation back inside the
with RayEngine(...) context so RunConfig.serialize(engine=engine) (and any
engine-dependent recording) runs while the engine is still live and before
shutdown.
- Around line 1263-1265: The call to _finalize_benchmark_run(...) is happening
after the DaskEngine context has exited so the engine is already shut down when
run_config.serialize(engine=engine) needs it; move the
_finalize_benchmark_run(args, run_config, validation_failures, query_failures,
engine=engine) invocation so it executes inside the with DaskEngine(...) block
(i.e., before the context manager exits) so run_config.serialize can access the
live engine during Dask runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4f40fd22-68af-45ff-b378-7e9bda8ca9e6
📒 Files selected for processing (5)
- python/cudf_polars/cudf_polars/experimental/benchmarks/utils_new_frontends.py
- python/cudf_polars/cudf_polars/experimental/rapidsmpf/frontend/core.py
- python/cudf_polars/cudf_polars/experimental/rapidsmpf/frontend/dask.py
- python/cudf_polars/cudf_polars/experimental/rapidsmpf/frontend/ray.py
- python/cudf_polars/cudf_polars/experimental/rapidsmpf/frontend/spmd.py
```python
rapidsmpf_options = engine.rapidsmpf_options.get_strings()
result["config_options"] = {
    "config_options": dataclasses.asdict(config_options),
    "rapidsmpf_options": rapidsmpf_options,
}
```
Keep serialized RapidsMPF options in sync with engine resets.
serialize() now trusts engine.rapidsmpf_options, but the frontend _reset() paths rebuild their live Context from new Options objects without updating that cached attribute. After a reset, this JSON will report the old settings instead of the ones actually running. Please either refresh self.rapidsmpf_options in each _reset() or serialize from the same freshly resolved options object used by the reset path.
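One way to address that, sketched with hypothetical names modeled on this suggestion (a toy model, not the actual frontend code): have `_reset()` store the freshly resolved options before rebuilding the context, so `serialize()` and the live `Context` always read the same object.

```python
class StreamingEngine:
    """Toy engine showing one way to keep a cached options
    attribute in sync with the live context across resets."""

    def __init__(self, rapidsmpf_options: dict) -> None:
        self.rapidsmpf_options = rapidsmpf_options
        self.context = self._build_context(rapidsmpf_options)

    def _build_context(self, options: dict) -> dict:
        # Stand-in for rebuilding a rapidsmpf Context from Options.
        return {"options": options}

    def _reset(self, new_options: dict) -> None:
        # Refresh the cached attribute alongside the rebuild, so
        # serialize() never reports stale settings.
        self.rapidsmpf_options = new_options
        self.context = self._build_context(new_options)

    def serialize(self) -> dict:
        # Reads the same object the live context was built from.
        return {"rapidsmpf_options": self.rapidsmpf_options}


engine = StreamingEngine({"spill_device": "0.5"})
engine._reset({"spill_device": "0.9"})
assert engine.serialize() == {"rapidsmpf_options": {"spill_device": "0.9"}}
```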
```python
_finalize_benchmark_run(
    args, run_config, validation_failures, query_failures, engine=engine
)
```
Serialize the Ray engine before leaving the context manager.
This now runs after with RayEngine(...) exits, so shutdown() has already torn down the engine state that RunConfig.serialize(engine=engine) reads. That means the new config recording can emit empty/default data or fail instead of capturing the live runtime settings. Move _finalize_benchmark_run(...) back inside the with block.
Suggested fix:

```diff
 with RayEngine(
     rapidsmpf_options=run_config.streaming_options.to_rapidsmpf_options(),
     executor_options=executor_options,
     engine_options=engine_options,
     ray_init_options=ray_init_options,
 ) as engine:
     run_config = dataclasses.replace(run_config, n_workers=engine.nranks)
     records, plans, validation_failures, query_failures = _run_query_loop(
         benchmark,
         args,
         run_config,
         engine,
         numeric_type,
         date_type,
         validation_files,
     )
     run_config = dataclasses.replace(run_config, records=dict(records), plans=plans)
     run_config = _consolidate_logs(run_config, engine=engine)
-
-_finalize_benchmark_run(
-    args, run_config, validation_failures, query_failures, engine=engine
-)
+    _finalize_benchmark_run(
+        args, run_config, validation_failures, query_failures, engine=engine
+    )
```
```python
_finalize_benchmark_run(
    args, run_config, validation_failures, query_failures, engine=engine
)
```
Finalize the Dask run while the engine is still alive.
By the time this call executes, the with DaskEngine(...) block has already exited, so the engine has been shut down before run_config.serialize(engine=engine) reads from it. That defeats the new config-capture path for Dask runs. Call _finalize_benchmark_run(...) inside the with block instead.
Suggested fix
```diff
 try:
     with DaskEngine(
         rapidsmpf_options=run_config.streaming_options.to_rapidsmpf_options(),
         executor_options=executor_options,
         engine_options=engine_options,
         dask_client=dask_client,
     ) as engine:
         run_config = dataclasses.replace(run_config, n_workers=engine.nranks)
         records, plans, validation_failures, query_failures = _run_query_loop(
             benchmark,
             args,
             run_config,
             engine,
             numeric_type,
             date_type,
             validation_files,
         )
         run_config = dataclasses.replace(
             run_config, records=dict(records), plans=plans
         )
         run_config = _consolidate_logs(run_config, engine)
+        _finalize_benchmark_run(
+            args, run_config, validation_failures, query_failures, engine=engine
+        )
 finally:
     if dask_client is not None:
         dask_client.close()
-_finalize_benchmark_run(
-    args, run_config, validation_failures, query_failures, engine=engine
-)
```